AAAI.2017 - Machine Learning Applications

Total: 57

#1 Personalized Donor-Recipient Matching for Organ Transplantation [PDF] [Copy] [Kimi]

Authors: Jinsung Yoon ; Ahmed Alaa ; Martin Cadeiras ; Mihaela van der Schaar

Organ transplants can improve the life expectancy and quality of life for the recipient but carry the risk of serious post-operative complications, such as septic shock and organ rejection. The probability of a successful transplant depends in a very subtle fashion on compatibility between the donor and the recipient - but current medical practice is short of domain knowledge regarding the complex nature of recipient-donor compatibility. Hence a data-driven approach for learning compatibility has the potential for significant improvements in match quality. This paper proposes a novel system (ConfidentMatch) that is trained using data from electronic health records. ConfidentMatch predicts the success of an organ transplant (in terms of the 3-year survival rates) on the basis of clinical and demographic traits of the donor and recipient. ConfidentMatch captures the heterogeneity of the donor and recipient traits by optimally dividing the feature space into clusters and constructing a different optimal predictive model for each cluster. The system controls the complexity of the learned predictive model in a way that allows for more granular and accurate predictions for a larger number of potential recipient-donor pairs, thereby ensuring that predictions are "personalized" and tailored to individual characteristics to the finest possible granularity. Experiments conducted on the UNOS heart transplant dataset show the superiority of the prognostic value of ConfidentMatch over other competing benchmarks; ConfidentMatch provides predictions of success with 95% accuracy for 5,489 patients out of a total population of 9,620 patients, 410 more patients than the most competitive benchmark algorithm (DeepBoost).
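
A minimal cluster-then-predict sketch of the idea described above, with off-the-shelf k-means and logistic regression standing in for the paper's optimized partitioning and per-cluster model selection; all names, libraries and hyperparameters are illustrative assumptions, and every cluster is assumed to contain both outcomes.

```python
# Minimal cluster-then-predict sketch with off-the-shelf components (illustrative only; assumes
# every cluster contains both outcomes). k-means and logistic regression stand in for the paper's
# optimized partitioning and per-cluster model selection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_per_cluster(X, y, n_clusters=4, seed=0):
    """X: donor/recipient feature matrix; y: binary 3-year survival outcome."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    models = {c: LogisticRegression(max_iter=1000).fit(X[km.labels_ == c], y[km.labels_ == c])
              for c in range(n_clusters)}
    return km, models

def predict_survival(km, models, X_new):
    labels = km.predict(X_new)                         # route each pair to its cluster's model
    return np.array([models[c].predict_proba(x[None, :])[0, 1] for c, x in zip(labels, X_new)])
```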

#2 Active Learning with Cross-Class Similarity Transfer [PDF] [Copy] [Kimi]

Authors: Yuchen Guo ; Guiguang Ding ; Yue Gao ; Jungong Han

How to save labeling effort when training supervised classifiers is an important research topic in the machine learning community. Active learning (AL) and transfer learning (TL) are two useful tools to achieve this goal, and their combination, i.e., transfer active learning (T-AL), has also attracted considerable research interest. However, existing T-AL approaches consider transferring knowledge from a source/auxiliary domain that has the same class labels as the target domain, but ignore the relationship among classes. In this paper, we investigate a more practical setting where the classes in the source domain are related/similar to, but different from, the target domain classes. Specifically, we propose a novel cross-class T-AL approach that simultaneously transfers knowledge from the source domain and actively annotates the most informative samples in the target domain, so that we can train satisfactory classifiers with as few labeled samples as possible. In particular, based on class-class similarity and sample-sample similarity, we adopt a similarity propagation scheme to find the source domain samples that can well capture the characteristics of a target class, and then transfer those similar samples as (pseudo-)labeled data for the target class. In turn, the labeled and transferred samples are used to train classifiers and actively select new samples for annotation. Extensive experiments on three datasets demonstrate that the proposed approach significantly outperforms the state-of-the-art related approaches.
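
A hedged sketch of the transfer step: score source samples for one target class by seeding with class-class similarity and propagating over the source sample graph. The exact propagation rule and all hyperparameters below are assumptions, not taken from the paper.

```python
# Hedged sketch: score source samples for one target class and transfer the top k.
import numpy as np

def transfer_for_target_class(c, S_cc, src_labels, W_src, alpha=0.8, iters=20, k=10):
    """c          : index of the target class.
       S_cc       : (C_tgt, C_src) class-class similarity matrix.
       src_labels : (n_src,) class index of each source-domain sample.
       W_src      : (n_src, n_src) row-normalized sample-sample similarity among source samples.
       Returns indices of source samples to transfer as pseudo-labeled data for class c."""
    f0 = S_cc[c, src_labels]            # seed: how similar each sample's class is to target class c
    f = f0.copy()
    for _ in range(iters):              # propagate scores over the source sample graph
        f = alpha * (W_src @ f) + (1 - alpha) * f0
    return np.argsort(-f)[:k]           # transfer the k highest-scoring source samples
```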

#3 Progressive Prediction of Student Performance in College Programs [PDF] [Copy] [Kimi]

Authors: Jie Xu ; Yuli Han ; Daniel Marcu ; Mihaela van der Schaar

Accurately predicting students' future performance based on their tracked academic records in college programs is crucial for effectively carrying out necessary pedagogical interventions to ensure students' on-time graduation. Although there is a rich literature on predicting student performance in solving problems and studying courses using data-driven approaches, predicting student performance in completing college programs is much less studied and faces new challenges, mainly due to the diversity of courses selected by students and the requirement of continuously tracking and incorporating students' evolving progress. In this paper, we develop a novel algorithm that enables progressive prediction of students' performance by adapting ensemble learning techniques and utilizing education-specific domain knowledge. We prove its prediction performance guarantee and show its performance improvement over benchmark algorithms on a real-world student dataset from UCLA.

#4 Robust Manifold Matrix Factorization for Joint Clustering and Feature Extraction [PDF] [Copy] [Kimi]

Authors: Lefei Zhang ; Qian Zhang ; Bo Du ; Dacheng Tao ; Jane You

Low-rank matrix approximation has been widely used for data subspace clustering and feature representation in many computer vision and pattern recognition applications. However, to enhance discriminability, most matrix-approximation-based feature extraction algorithms generate cluster labels with some clustering algorithm (e.g., k-means) and then perform the matrix approximation guided by this label information. In addition, noises and outliers in the dataset with large reconstruction errors easily dominate the objective function under the conventional ℓ2-norm based squared-residue minimization. In this paper, we propose a novel clustering and feature extraction algorithm based on a unified low-rank matrix factorization framework, which suggests that the observed data matrix can be approximated by the product of a projection matrix and a low-dimensional representation, where the low-dimensional representation can in turn be approximated by a cluster indicator and a latent feature matrix simultaneously. Furthermore, we propose using the ℓ2,1-norm and integrating manifold regularization to further improve the model. A novel Augmented Lagrangian Method (ALM) based procedure is designed to effectively and efficiently seek the optimal solution of the problem. Experimental results from both the clustering and feature extraction perspectives demonstrate the superior performance of the proposed method.
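
One plausible instantiation of such an objective (the symbols and the exact form below are illustrative assumptions, not the paper's formulation) is

```latex
\min_{U,\,G,\,F}\; \lVert X - U G F \rVert_{2,1} \;+\; \lambda\,\operatorname{tr}\!\left(F L F^{\top}\right)
\quad \text{s.t.}\quad U^{\top}U = I,\;\; F \in \{0,1\}^{c \times n},\;\; F^{\top}\mathbf{1}_c = \mathbf{1}_n,
```

where X is the d-by-n data matrix, U the projection matrix, G the latent feature matrix, F the cluster indicator, and L a graph Laplacian built from the data. The ℓ2,1-norm sums the ℓ2 norms of the residual columns, so outlier samples contribute linearly rather than quadratically, while the trace term encourages nearby points on the manifold to share cluster indicators.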

#5 Learning with Feature Network and Label Network Simultaneously [PDF] [Copy] [Kimi]

Authors: Yingming Li ; Ming Yang ; Zenglin Xu ; Zhongfei (Mark) Zhang

For many supervised learning problems, limited training samples and incomplete labels are two difficult challenges, which usually lead to degraded performance in label prediction. To improve the generalization performance, in this paper we propose Doubly Regularized Multi-Label learning (DRML), which exploits feature network and label network regularization simultaneously. In more detail, the proposed algorithm first constructs a feature network and a label network with a marginalized linear denoising autoencoder on the feature set and the label set, respectively, and then learns a robust predictor with both the feature network and the label network regularization. While DRML is a general method for multi-label learning, in the evaluations we focus on the specific application of multi-label text tagging. Extensive evaluations on three benchmark data sets demonstrate that DRML achieves superior performance in comparison with existing multi-label learning methods.

#6 Learning Attributes from the Crowdsourced Relative Labels [PDF] [Copy] [Kimi]

Authors: Tian Tian ; Ning Chen ; Jun Zhu

Finding semantic attributes to describe related concepts is typically a hard problem. The commonly used attributes in most fields are designed by domain experts, which is expensive and time-consuming. In this paper we propose an efficient method for learning human-comprehensible attributes with crowdsourcing. We first design an analogical interface to collect relative labels from the crowd. Then we propose a hierarchical Bayesian model, as well as an efficient initialization strategy, to aggregate labels and extract concise attributes. Our experimental results demonstrate promise in discovering diverse and convincing attributes, which significantly improve performance on challenging zero-shot learning tasks.

#7 Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation [PDF] [Copy] [Kimi]

Authors: Cao Xiao ; Ping Zhang ; W. Chaovalitwongse ; Jianying Hu ; Fei Wang

Adverse drug reactions (ADRs) are a major burden for patients and the healthcare industry. They cause preventable hospitalizations and deaths and are associated with enormous costs. Traditional preclinical in vitro safety profiling and clinical safety trials are limited by small scale, long duration, huge financial cost and limited statistical significance. The availability of large amounts of drug and ADR data potentially allows ADR prediction during drugs' early preclinical stage with data analytics methods, to inform more targeted clinical safety tests. Despite their initial success, existing methods trade off interpretability, predictive power and efficiency. This urges us to explore methods that could have all these strengths and provide practical solutions for real-world ADR prediction. We cast the ADR-drug relation structure into a three-layer hierarchical Bayesian model. We interpret each ADR as a symbolic word and apply latent Dirichlet allocation (LDA) to learn topics that may represent biochemical mechanisms relating ADRs to drug structures. Based on LDA, we design an equivalent regularization term to incorporate hierarchical ADR domain knowledge. Finally, we develop a mixed-input model leveraging a fast collapsed Gibbs sampling method in which the complexity of each iteration is proportional only to the number of positive ADRs. Experiments on real-world data show that our models achieve higher prediction accuracy and shorter running time than the state-of-the-art alternatives.
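
As a reference point for the sampling claim, here is plain collapsed Gibbs sampling for standard LDA with each drug treated as a document and each observed (positive) ADR as a word; every sweep touches only the positive ADR occurrences, which is the complexity property mentioned above. The paper's mixed-input model and knowledge-based regularization are not reproduced, and all hyperparameters are illustrative.

```python
# Plain collapsed Gibbs sampling for LDA over drug/ADR data (illustrative hyperparameters).
import numpy as np

def lda_collapsed_gibbs(drug_adrs, n_adrs, K=20, alpha=0.1, beta=0.01, iters=200, seed=0):
    """drug_adrs: list over drugs, each a list of observed (positive) ADR ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(drug_adrs), K))          # drug-topic counts
    n_kw = np.zeros((K, n_adrs))                  # topic-ADR counts
    n_k = np.zeros(K)                             # topic totals
    z = []                                        # topic assignment of every positive ADR occurrence
    for d, adrs in enumerate(drug_adrs):
        zd = rng.integers(0, K, size=len(adrs))
        z.append(zd)
        for w, t in zip(adrs, zd):
            n_dk[d, t] += 1; n_kw[t, w] += 1; n_k[t] += 1
    for _ in range(iters):                        # each sweep visits only the positive ADRs
        for d, adrs in enumerate(drug_adrs):
            for i, w in enumerate(adrs):
                t = z[d][i]
                n_dk[d, t] -= 1; n_kw[t, w] -= 1; n_k[t] -= 1
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_adrs * beta)
                t = rng.choice(K, p=p / p.sum())
                z[d][i] = t
                n_dk[d, t] += 1; n_kw[t, w] += 1; n_k[t] += 1
    return n_dk, n_kw                             # drug-topic and topic-ADR count matrices
```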

#8 Multitask Dyadic Prediction and Its Application in Prediction of Adverse Drug-Drug Interaction [PDF] [Copy] [Kimi]

Authors: Bo Jin ; Haoyu Yang ; Cao Xiao ; Ping Zhang ; Xiaopeng Wei ; Fei Wang

Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality around the world. Identifying potential DDIs during the drug design process is critical for guiding targeted clinical drug safety testing. Although detection of adverse DDIs is conducted during Phase IV clinical trials, a large number of new DDIs are still found by accident after drugs are put on the market. With the arrival of the big data era, more and more pharmaceutical research and development data are becoming available, providing an invaluable resource for mining insights that can potentially be leveraged in early prediction of DDIs. Many computational approaches have been proposed in recent years for DDI prediction. However, most of them focus on binary prediction (with or without DDI), despite the fact that each DDI is associated with a different type. Predicting the actual DDI type will help us better understand the DDI mechanism and identify proper ways to prevent it. In this paper, we formulate the DDI type prediction problem as a multitask dyadic regression problem, where the prediction of each specific DDI type is treated as a task. Compared with conventional matrix completion approaches, which can only impute the missing entries in the DDI matrix, our approach directly regresses those dyadic relationships (DDIs) and thus can be extended to new drugs more easily. We develop an effective proximal gradient method to solve the problem. Evaluation on real-world datasets demonstrates the effectiveness of the proposed approach.
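
A hedged sketch of a proximal gradient iteration for masked multitask dyadic regression, using an ℓ2,1 (row-group) regularizer as an illustrative choice; the paper's exact features, loss and regularizer may differ.

```python
# Hedged sketch of proximal gradient for masked multitask dyadic regression with an l2,1 penalty.
import numpy as np

def prox_l21(W, t):
    """Row-wise group soft-thresholding: proximal operator of t * sum_i ||W[i, :]||_2."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)

def multitask_prox_grad(X, Y, lam=0.1, step=1e-3, iters=500):
    """X: (n, d) joint features of drug pairs (dyads); Y: (n, T) interaction scores for T DDI
       types, with NaN marking unobserved entries. Minimizes
       0.5 * ||mask * (X W - Y)||_F^2 + lam * ||W||_{2,1} by proximal gradient descent."""
    mask = ~np.isnan(Y)
    Y0 = np.nan_to_num(Y)
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        R = mask * (X @ W - Y0)                  # residual on observed dyad/type entries only
        W = prox_l21(W - step * (X.T @ R), step * lam)
    return W
```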

#9 Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval [PDF] [Copy] [Kimi]

Authors: Erkun Yang ; Cheng Deng ; Wei Liu ; Xianglong Liu ; Dacheng Tao ; Xinbo Gao

With the benefits of low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing cross-modal hashing methods fail to obtain powerful hash codes because they directly utilize hand-crafted features or ignore heterogeneous correlations across modalities, which greatly degrades retrieval performance. In this paper, we propose a novel deep cross-modal hashing method that generates compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between various modalities. Our architecture integrates different types of pairwise constraints to encourage the similarities of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced to this architecture, thus enhancing the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.
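
A hedged sketch of the kind of training losses described: a sigmoid pairwise likelihood applied inter- and intra-modally plus a decorrelation penalty on the relaxed codes. The exact loss forms in the paper may differ, and all names are illustrative.

```python
# Hedged sketch of pairwise inter/intra-modal losses plus bit decorrelation (PyTorch assumed).
import torch
import torch.nn.functional as F

def pairwise_hash_loss(b_img, b_txt, S, gamma=0.1):
    """b_img, b_txt: (N, L) relaxed (e.g., tanh) hash codes from the image and text branches.
       S: (N, N) float tensor, 1 for semantically similar pairs and 0 otherwise."""
    def pair_nll(a, b):
        theta = a @ b.t() / 2
        # negative log-likelihood of pairwise similarity under p(S=1) = sigmoid(theta)
        return (F.softplus(theta) - S * theta).mean()
    inter = pair_nll(b_img, b_txt)
    intra = pair_nll(b_img, b_img) + pair_nll(b_txt, b_txt)
    eye = torch.eye(b_img.shape[1], device=b_img.device)
    decor = ((b_img.t() @ b_img / b_img.shape[0] - eye) ** 2).mean() \
          + ((b_txt.t() @ b_txt / b_txt.shape[0] - eye) ** 2).mean()
    return inter + intra + gamma * decor
```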

#10 Simultaneous Clustering and Ensemble [PDF] [Copy] [Kimi]

Authors: Zhiqiang Tao ; Hongfu Liu ; Yun Fu

Ensemble Clustering (EC) has gained a great deal of attention throughout the fields of data mining and machine learning since it emerged as an effective and robust clustering framework. Typically, EC methods try to fuse multiple basic partitions (BPs) into a consensus one, where each BP is obtained by running a traditional clustering method on the same dataset. One promising direction for ensemble clustering is to derive a pairwise similarity from the BPs and then cast consensus as a graph partitioning problem. However, these graph-based methods may suffer from information loss when computing the similarity between data points, because they only utilize the categorical data provided by multiple BPs and neglect the rich information in the raw features. This problem can badly undermine the underlying cluster structure in the original feature space and thus degrade clustering performance. In light of this, we propose a novel Simultaneous Clustering and Ensemble (SCE) framework to alleviate such detrimental effects; it employs the similarity matrix from the raw features to enhance the co-association matrix summarized from multiple BPs. Two neat closed-form solutions given by eigenvalue decomposition are provided for SCE. Experiments conducted on 16 real-world datasets demonstrate the effectiveness of the proposed SCE over traditional clustering and state-of-the-art ensemble clustering methods. Moreover, several impact factors that may affect our method are also explored extensively.
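
A rough sketch of the blending idea only (not the paper's closed-form eigenvalue-decomposition solutions): combine the co-association matrix with a raw-feature similarity and cluster the leading eigenvectors. The blending weight and the k-means step are assumptions.

```python
# Rough sketch: blend co-association with raw-feature similarity, then a spectral step.
import numpy as np
from sklearn.cluster import KMeans

def sce_sketch(co_assoc, K_raw, alpha=0.5, n_clusters=3):
    """co_assoc: (n, n) co-association matrix summarized from the basic partitions.
       K_raw: (n, n) similarity computed from raw features (e.g., an RBF kernel)."""
    S = alpha * co_assoc + (1 - alpha) * K_raw
    S = (S + S.T) / 2                               # keep the blended similarity symmetric
    _, vecs = np.linalg.eigh(S)
    U = vecs[:, -n_clusters:]                       # leading eigenvectors as an embedding
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)
```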

#11 ICU Mortality Prediction: A Classification Algorithm for Imbalanced Datasets [PDF] [Copy] [Kimi]

Authors: Sakyajit Bhattacharya ; Vaibhav Rajan ; Harsh Shrivastava

Determining mortality risk is important for critical decisions in Intensive Care Units (ICUs). The need for machine learning models that provide accurate patient-specific predictions of mortality is well recognized. We present a new algorithm for ICU mortality prediction that is designed to address the problem of imbalance, which occurs, in the context of binary classification, when one of the two classes is significantly under-represented in the data. We take a fundamentally new approach by exploiting the class imbalance through a feature transformation such that the transformed features are easier to classify. Hypothesis testing is used for classification, with a test statistic that follows the distribution of the difference of two chi-squared random variables, for which there is no analytic expression; we derive an accurate approximation. Experiments on a benchmark dataset of 4,000 ICU patients show that our algorithm surpasses the best competing methods for mortality prediction.

#12 On Predictive Patent Valuation: Forecasting Patent Citations and Their Types [PDF] [Copy] [Kimi]

Authors: Xin Liu ; Junchi Yan ; Shuai Xiao ; Xiangfeng Wang ; Hongyuan Zha ; Stephen Chu

Patents are widely regarded as a proxy for inventive output, which is valuable and can be commercialized by various means. Individual patent information such as technology field, classification, claims and application jurisdictions is increasingly available as released by different venues. A long line of work has relied on the hypothesis that the citations received by a patent are a proxy for knowledge flows or the impact of the patent and are thus directly related to patent value. This paper does not fall into the line of intensive existing work that tests or applies this hypothesis; rather, we aim to address the limitation of using so-far received citations for patent valuation. By devising a point-process-based, citation-type-aware (self-citation and non-self-citation) prediction model that incorporates the various information of a patent, we open up the possibility of performing predictive patent valuation, which can be especially useful for newly granted patents with emerging technology. A study on real-world data corroborates the efficacy of our approach. Our initiative may also have policy implications for technology markets, patent systems and all other stakeholders. The code and curated data will be made available to the research community.

#13 Unsupervised Deep Learning for Optical Flow Estimation [PDF] [Copy] [Kimi]

Authors: Zhe Ren ; Junchi Yan ; Bingbing Ni ; Bin Liu ; Xiaokang Yang ; Hongyuan Zha

Recent work has shown that optical flow estimation can be formulated as a supervised learning problem, and convolutional networks have been successfully applied to this task. However, supervised flow learning is hampered by the shortage of labeled training data; as a consequence, existing methods have to turn to large synthetic datasets whose ground truth can be easily generated by computer. In this work, we explore whether a deep network for flow estimation can be trained without supervision. Using image warping by the estimated flow, we devise a simple yet effective unsupervised method for learning optical flow, by directly minimizing the photometric error between the first frame and the warped second frame. We demonstrate that a flow network can be trained end-to-end with our unsupervised scheme. In some cases, our results come tantalizingly close to the performance of methods trained with full supervision.
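
A minimal sketch of the photometric loss described above, assuming a PyTorch setting and the usual (x, y) flow convention; occlusion handling and smoothness terms are omitted.

```python
# Minimal photometric warping loss (PyTorch assumed; no occlusion or smoothness terms).
import torch
import torch.nn.functional as F

def photometric_loss(img1, img2, flow):
    """img1, img2: (B, C, H, W) consecutive frames; flow: (B, 2, H, W) flow from img1 to img2,
       channel 0 = horizontal (x) displacement, channel 1 = vertical (y) displacement."""
    B, _, H, W = flow.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(flow.device)       # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                                 # where each pixel lands in img2
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0                           # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    warped = F.grid_sample(img2, torch.stack((gx, gy), dim=-1), align_corners=True)
    return (img1 - warped).abs().mean()                               # L1 photometric error
```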

#14 Modeling the Intensity Function of Point Process Via Recurrent Neural Networks [PDF] [Copy] [Kimi]

Authors: Shuai Xiao ; Junchi Yan ; Xiaokang Yang ; Hongyuan Zha ; Stephen Chu

Event sequences, asynchronously generated with random timestamps, are ubiquitous across applications. The precise and arbitrary timestamps carry important clues about the underlying dynamics and make event data fundamentally different from time series, which are indexed at fixed and equal time intervals. One expressive mathematical tool for modeling events is the point process. The intensity functions of many point processes involve two components: the background and the effect of the history. Due to its inherent spontaneity, the background can be treated as a time series, while the other component needs to handle the history of events. In this paper, we model the background with a Recurrent Neural Network (RNN) whose units are aligned with the time-series indexes, while the history effect is modeled by another RNN whose units are aligned with asynchronous events to capture long-range dynamics. The whole model, with event type and timestamp prediction output layers, can be trained end-to-end. Our approach takes an RNN perspective on point processes, modeling both their background and history effect. For utility, our method allows a black-box treatment of the intensity, which is often a pre-defined parametric form in point processes. Meanwhile, end-to-end training opens the venue for reusing existing rich techniques in deep networks for point process modeling. We apply our model to the predictive maintenance problem using a log dataset from more than 1,000 ATMs of a global bank headquartered in North America.
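
A hedged sketch of the two-RNN design: one GRU over evenly sampled time-series features for the background and one GRU over asynchronous events for the history effect, fused for next-event type and timing prediction. Layer sizes, the fusion scheme and the prediction heads are assumptions.

```python
# Hedged sketch of the two-RNN model (layer sizes, fusion and heads are assumptions).
import torch
import torch.nn as nn

class TwoStreamEventModel(nn.Module):
    def __init__(self, ts_dim, n_types, emb=16, hid=32):
        super().__init__()
        self.ts_rnn = nn.GRU(ts_dim, hid, batch_first=True)        # background: evenly sampled series
        self.ev_emb = nn.Embedding(n_types, emb)
        self.ev_rnn = nn.GRU(emb + 1, hid, batch_first=True)       # history: asynchronous events
        self.type_head = nn.Linear(2 * hid, n_types)
        self.time_head = nn.Linear(2 * hid, 1)

    def forward(self, ts_seq, ev_types, ev_gaps):
        # ts_seq: (B, T_ts, ts_dim); ev_types: (B, T_ev) long; ev_gaps: (B, T_ev) inter-event times
        _, h_ts = self.ts_rnn(ts_seq)
        ev_in = torch.cat([self.ev_emb(ev_types), ev_gaps.unsqueeze(-1)], dim=-1)
        _, h_ev = self.ev_rnn(ev_in)
        h = torch.cat([h_ts[-1], h_ev[-1]], dim=-1)                 # fuse background and history states
        return self.type_head(h), self.time_head(h)                 # next event type logits, next gap
```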

#15 Event Video Mashup: From Hundreds of Videos to Minutes of Skeleton [PDF] [Copy] [Kimi]

Authors: Lianli Gao ; Peng Wang ; Jingkuan Song ; Zi Huang ; Jie Shao ; Heng Shen

The explosive growth of video content on the Web has been revolutionizing the way people share, exchange and perceive information such as events. While an individual video usually concerns a specific aspect of an event, videos uploaded by different users at different locations and times can carry different emphases and complement each other in describing the event. Combining these videos from different sources can unveil a more complete picture of the event. Simply concatenating the videos is an intuitive solution, but it may degrade the user experience, since viewing such highly redundant, noisy and disorganized content is time-consuming and tedious. Therefore, we develop a novel approach, termed event video mashup (EVM), to automatically generate a unified short video from a collection of Web videos that describes the storyline of an event. We propose a submodular-based content selection model that embodies both importance and diversity to depict the event from comprehensive aspects in an efficient way. Importantly, the video content is organized temporally and semantically, conforming to the event's evolution. We evaluate our approach on a real-world YouTube event dataset that we collected ourselves. Extensive experimental results demonstrate the effectiveness of the proposed framework.
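
A simple greedy importance-plus-diversity heuristic in the spirit of the content selection step; it is not the paper's exact submodular objective, and the trade-off weight is an assumption.

```python
# Simple greedy importance-plus-diversity selection (illustrative, not the paper's objective).
import numpy as np

def greedy_select(importance, sim, budget, lam=1.0):
    """importance: (n,) clip importance scores; sim: (n, n) clip-clip similarity in [0, 1]."""
    chosen, max_sim = [], np.zeros(len(importance))
    for _ in range(budget):
        gain = importance - lam * max_sim        # discount clips similar to what is already chosen
        gain[chosen] = -np.inf
        i = int(np.argmax(gain))
        chosen.append(i)
        max_sim = np.maximum(max_sim, sim[:, i])
    return chosen
```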

#16 Collaborative Dynamic Sparse Topic Regression with User Profile Evolution for Item Recommendation [PDF] [Copy] [Kimi]

Authors: Li Gao ; Jia Wu ; Chuan Zhou ; Yue Hu

In many time-aware item recommender systems, accurately modeling the evolution of both user profiles and item content over time is essential. However, most existing methods focus on learning users' dynamic interests, where item content is assumed to be stable over time, and thus fail to capture dynamic changes in item content. In this paper, we present a novel method, CDUE, for time-aware item recommendation, which captures the evolution of both users' interests and item content via topic dynamics. Specifically, we propose a dynamic sparse topic model to track the evolution of topics as item content changes over time, and adapt a vector autoregressive model to profile users' dynamic interests. The items' topics and users' interests, together with their evolutions, are learned collaboratively and simultaneously in a unified learning framework. Experimental results on two real-world data sets demonstrate the quality and effectiveness of the proposed method and show that it can be used to make better future recommendations.
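
A small sketch of the user-profile evolution component only: fitting a first-order vector autoregression to a user's topic-interest vectors by least squares. The dynamic sparse topic model and the joint learning framework are not sketched here.

```python
# Sketch of the user-interest evolution component only: a first-order VAR fit by least squares.
import numpy as np

def fit_var1(profiles):
    """profiles: (T, k) one user's topic-interest vectors over T time steps.
       Fits u_t ~= u_{t-1} @ A and returns A plus the one-step-ahead forecast."""
    U_prev, U_next = profiles[:-1], profiles[1:]
    A, *_ = np.linalg.lstsq(U_prev, U_next, rcond=None)   # least-squares transition matrix
    return A, profiles[-1] @ A                            # transition, forecast of the next profile
```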

#17 Coupling Implicit and Explicit Knowledge for Customer Volume Prediction [PDF] [Copy] [Kimi]

Authors: Jingyuan Wang ; Yating Lin ; Junjie Wu ; Zhong Wang ; Zhang Xiong

Customer volume prediction, which predicts the volume from a customer source to a service place, is a very important technique for location selection, market investigation and other related applications. Most traditional methods make use of only partial information for either supervised or unsupervised modeling, and thus cannot integrate all available knowledge. In this paper, we propose a method named GR-NMF that jointly models the implicit correlations hidden inside customer volumes and explicit geographical knowledge in an integrated probabilistic framework. The effectiveness of GR-NMF in coupling all-round knowledge is verified on a real-life outpatient dataset under different scenarios. GR-NMF shows particularly evident advantages over all baselines in location selection under the cold-start challenge.
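
For reference, a generic graph-regularized NMF with multiplicative updates; the paper's GR-NMF couples the factors with geographic knowledge in a probabilistic framework, so this is only a structural sketch with assumed matrix shapes.

```python
# Generic graph-regularized NMF with multiplicative updates (a structural sketch only).
import numpy as np

def graph_reg_nmf(X, A, k=10, lam=1.0, iters=200, eps=1e-9, seed=0):
    """X: (m, n) nonnegative customer-volume matrix (sources x places); A: (n, n) nonnegative
       geographic adjacency between places. Approximates X ~= U @ V.T while pulling the
       representations of geographically adjacent places together."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U, V = rng.random((m, k)), rng.random((n, k))
    D = np.diag(A.sum(axis=1))                            # degree matrix of the geographic graph
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (A @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V
```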

#18 Portfolio Selection via Subset Resampling [PDF] [Copy] [Kimi]

Authors: Weiwei Shen ; Jun Wang

As the cornerstone of modern portfolio theory, Markowitz's mean-variance optimization is a major model adopted in portfolio management. However, the estimation errors in its input parameters substantially deteriorate its performance in practice: losses can be huge when the number of assets for investment is not much smaller than the sample size of the historical data. To extend the applicability of Markowitz's portfolio optimization to large portfolios, in this paper we propose a new portfolio strategy based on subset resampling. By resampling subsets of the original large universe of assets, we construct the associated subset portfolios with more accurately estimated parameters without requiring additional data. By aggregating a number of constructed subset portfolios, we obtain a well-diversified portfolio over all assets. To investigate its performance, we first analyze the corresponding efficient frontiers by simulation and provide an analysis of hyperparameter selection, and then empirically compare its out-of-sample performance with those of various competing strategies on diversified datasets. Experimental results corroborate that the proposed portfolio strategy shows marked superiority across extensive evaluation criteria.
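
A hedged sketch of subset resampling: repeatedly sample a subset of assets, compute that subset's portfolio weights, and average the results. Minimum-variance weights are used below purely for brevity where the paper works with mean-variance optimization, and all parameters are illustrative.

```python
# Hedged sketch of subset resampling (minimum-variance weights used for brevity).
import numpy as np

def subset_resampled_portfolio(returns, m, n_subsets=500, seed=0):
    """returns: (T, N) historical asset returns; m: subset size (m << N is the interesting case)."""
    rng = np.random.default_rng(seed)
    T, N = returns.shape
    w = np.zeros(N)
    for _ in range(n_subsets):
        idx = rng.choice(N, size=m, replace=False)        # resample a subset of assets
        cov = np.cov(returns[:, idx], rowvar=False)
        raw = np.linalg.solve(cov, np.ones(m))            # unnormalized minimum-variance weights
        w[idx] += raw / raw.sum()                         # this subset's portfolio, summing to 1
    return w / n_subsets                                  # average over subset portfolios
```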

#19 Soft Video Parsing by Label Distribution Learning [PDF] [Copy] [Kimi]

Authors: Xin Geng ; Miaogen Ling

In this paper, we tackle the problem of segmenting out a sequence of actions from videos. The videos contain background and actions that are usually composed of ordered sub-actions. We refer to the sub-actions and the background as semantic units. Considering the possible overlap between two adjacent semantic units, we utilize label distributions to annotate the various segments in the video. A label distribution covers a certain number of semantic unit labels, representing the degree to which each label describes the video segment. The mapping from a video segment to its label distribution is then learned by a Label Distribution Learning (LDL) algorithm. Based on the LDL model, a soft video parsing method with segmental regular grammars is proposed to construct a tree structure for the video. Each leaf of the tree stands for a video clip of background or a sub-action. The proposed method shows promising results on the THUMOS'14 and MSR-II datasets, and its computational complexity is much lower than that of the state-of-the-art method.
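
A small reference sketch of the label-distribution fitting step, using a maximum-entropy-style model trained by minimizing the KL divergence to the annotated distributions; this is a common LDL baseline rather than the paper's learner, and the grammar-based parsing stage is not covered.

```python
# A common LDL baseline: maximum-entropy model fit by minimizing KL to annotated distributions.
import numpy as np

def fit_ldl(X, D, lr=0.1, iters=500):
    """X: (n, f) segment features; D: (n, c) annotated label distributions (rows sum to 1)."""
    W = np.zeros((X.shape[1], D.shape[1]))
    for _ in range(iters):
        Z = X @ W
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)                 # model distribution p(y | x) = softmax(x W)
        W -= lr * X.T @ (P - D) / X.shape[0]              # gradient of the mean KL(D || P)
    return W
```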

#20 Knowing What to Ask: A Bayesian Active Learning Approach to the Surveying Problem [PDF] [Copy] [Kimi]

Authors: Yoad Lewenberg ; Yoram Bachrach ; Ulrich Paquet ; Jeffrey Rosenschein

We examine the surveying problem, in which we attempt to predict how a target user is likely to respond to questions by iteratively querying that user, while collaboratively leveraging the responses of a sample set of users. We focus on an active learning approach, where the next question we select to ask the user depends on their responses to the previous questions. We propose a method for solving the problem based on a Bayesian dimensionality reduction technique. We empirically evaluate our method, contrasting it with benchmark approaches based on augmented linear regression, and show that it achieves much better predictive performance and is much more robust when there is missing data.
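
A hedged sketch of one way to choose the next question: a linear-Gaussian posterior update on the target user's latent trait (with question factors assumed to be learned from the sample users), followed by a largest-predictive-variance query. This uncertainty heuristic stands in for the paper's Bayesian dimensionality-reduction criterion.

```python
# Hedged sketch: linear-Gaussian trait posterior + largest-predictive-variance question selection.
import numpy as np

def next_question(Q, answered, answers, sigma2=0.25, prior_var=1.0):
    """Q: (n_questions, d) question factors assumed learned from the sample users.
       answered: list of question indices the target user has answered; answers: their values."""
    d = Q.shape[1]
    Qa = Q[answered]
    precision = np.eye(d) / prior_var + Qa.T @ Qa / sigma2        # posterior precision of the trait
    cov = np.linalg.inv(precision)
    mean = cov @ (Qa.T @ np.asarray(answers, dtype=float) / sigma2)
    pred_var = np.einsum("ij,jk,ik->i", Q, cov, Q) + sigma2       # predictive variance per question
    pred_var[answered] = -np.inf                                  # never re-ask
    return int(np.argmax(pred_var)), Q @ mean                     # next question, predicted answers
```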

#21 FeaBoost: Joint Feature and Label Refinement for Semantic Segmentation [PDF] [Copy] [Kimi]

Authors: Yulei Niu ; Zhiwu Lu ; Songfang Huang ; Xin Gao ; Ji-Rong Wen

We propose a novel approach, called FeaBoost, to image semantic segmentation with only image-level labels taken as weakly-supervised constraints. Our approach is motivated by two observations: 1) each superpixel can be represented as a linear combination of basic components (e.g., predefined classes); 2) visually similar superpixels are highly likely to share the same set of labels, i.e., they tend to have a common combination of predefined classes. Taking these two observations into consideration, we formulate semantic segmentation as joint feature and label refinement over superpixels. Furthermore, we develop an efficient FeaBoost algorithm to solve this optimization problem. Extensive experiments on the MSRC and LabelMe datasets demonstrate the superior performance of our FeaBoost approach in comparison with the state-of-the-art methods, especially when noisy labels are provided for semantic segmentation.

#22 Multidimensional Scaling on Multiple Input Distance Matrices [PDF] [Copy] [Kimi]

Authors: Song Bai ; Xiang Bai ; Longin Jan Latecki ; Qi Tian

Multidimensional Scaling (MDS) is a classic technique that seeks vectorial representations for data points given the pairwise distances between them. In recent years, data are usually collected from diverse sources or have multiple heterogeneous representations. However, how to perform multidimensional scaling on multiple input distance matrices remains, to the best of our knowledge, unsolved. In this paper, we first define this new task formally. Then, we propose a new algorithm called Multi-View Multidimensional Scaling (MVMDS) that treats each input distance matrix as one view. The proposed algorithm learns the weights of views (i.e., distance matrices) automatically by exploring the consensus information and complementary nature of the views. Experimental results on synthetic as well as real datasets demonstrate the effectiveness of MVMDS. We hope that our work encourages wider consideration in the many domains where MDS is needed.
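
A rough stand-in for the automatic view weighting: alternate between embedding a weighted combination of the distance matrices with classical MDS and re-weighting each view by how well the embedding reproduces it. The paper's actual objective and update rules are not reproduced.

```python
# Rough stand-in for MVMDS: classical MDS on a weighted combination of views, with re-weighting.
import numpy as np

def classical_mds(D, dim=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                        # double centering of squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def multiview_mds(dist_mats, dim=2, iters=10):
    """dist_mats: list of (n, n) distance matrices, one per view."""
    w = np.ones(len(dist_mats)) / len(dist_mats)
    for _ in range(iters):
        X = classical_mds(sum(wi * Di for wi, Di in zip(w, dist_mats)), dim)
        D_emb = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        stress = np.array([((Di - D_emb) ** 2).sum() for Di in dist_mats])
        w = 1.0 / (stress + 1e-12)                     # views the embedding fits well get more weight
        w /= w.sum()
    return X, w
```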

#23 Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay [PDF] [Copy] [Kimi]

Authors: Haiyan Yin ; Sinno Pan

The process of transferring knowledge from multiple reinforcement learning policies into a single multi-task policy via a distillation technique is known as policy distillation. Under a deep reinforcement learning setting, due to the large parameter size and the huge state space of each task domain, policy distillation requires extensive computational effort to train the multi-task policy network. In this paper, we propose a new policy distillation architecture for deep reinforcement learning, in which each task uses its task-specific high-level convolutional features as the inputs to the multi-task policy network. Furthermore, we propose a new sampling framework, termed hierarchical prioritized experience replay, to selectively choose experiences from the replay memories of each task domain for learning on the network. With these two contributions, we aim to accelerate the learning of the multi-task policy network while guaranteeing good performance. We use Atari 2600 games as the testing environment to demonstrate the efficiency and effectiveness of our proposed solution for policy distillation.
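
A hedged sketch of two-level prioritized sampling, first over tasks and then over transitions within the chosen task's replay memory; the exact priority definitions and corrections (e.g., importance-sampling weights) are assumptions, not taken from the paper.

```python
# Hedged sketch of two-level (task, then transition) prioritized sampling from per-task replay.
import numpy as np

def sample_batch(task_memories, task_priorities, batch_size=32, alpha=0.6, seed=None):
    """task_memories: {task_id: list of transitions};
       task_priorities: {task_id: np.ndarray of per-transition priorities, e.g. |TD error|}."""
    rng = np.random.default_rng(seed)
    tasks = list(task_memories)
    task_p = np.array([task_priorities[t].sum() for t in tasks]) ** alpha
    task_p /= task_p.sum()
    batch = []
    for _ in range(batch_size):
        t = tasks[rng.choice(len(tasks), p=task_p)]          # first level: pick a task domain
        p = task_priorities[t] ** alpha
        i = rng.choice(len(task_memories[t]), p=p / p.sum()) # second level: pick a transition
        batch.append((t, task_memories[t][i]))
    return batch
```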

#24 Neural Programming by Example [PDF] [Copy] [Kimi]

Authors: Chengxun Shu ; Hongyu Zhang

Programming by Example (PBE) aims to automatically infer a computer program for accomplishing a certain task from sample inputs and outputs. In this paper, we propose a deep neural network (DNN) based PBE model called Neural Programming by Example (NPBE), which learns from input-output strings and induces programs that solve string manipulation problems. Our NPBE model has four neural-network-based components: a string encoder, an input-output analyzer, a program generator and a symbol selector. We demonstrate the effectiveness of NPBE by training it end-to-end to solve some common string manipulation problems in spreadsheet systems. The results show that our model can induce string manipulation programs effectively. Our work is one step towards teaching DNNs to generate computer programs.

#25 Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction [PDF] [Copy] [Kimi]

Authors: Junbo Zhang ; Yu Zheng ; Dekang Qi

Forecasting the flow of crowds is of great importance to traffic management and public safety, and very challenging as it is affected by many complex factors, such as inter-region traffic, events, and weather. We propose a deep-learning-based approach, called ST-ResNet, to collectively forecast the inflow and outflow of crowds in each and every region of a city. We design an end-to-end structure of ST-ResNet based on unique properties of spatio-temporal data. More specifically, we employ the residual neural network framework to model the temporal closeness, period, and trend properties of crowd traffic. For each property, we design a branch of residual convolutional units, each of which models the spatial properties of crowd traffic. ST-ResNet learns to dynamically aggregate the output of the three residual neural networks based on data, assigning different weights to different branches and regions. The aggregation is further combined with external factors, such as weather and day of the week, to predict the final traffic of crowds in each and every region. Experiments on two types of crowd flows in Beijing and New York City (NYC) demonstrate that the proposed ST-ResNet outperforms six well-known methods.
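
A hedged structural sketch of the described architecture in PyTorch: three convolutional residual branches for the closeness, period and trend inputs, fused with learnable per-branch, per-region weights and combined with external factors. Channel counts, depths and the external-factor network are assumptions.

```python
# Hedged structural sketch of a three-branch residual network with learned fusion (PyTorch).
import torch
import torch.nn as nn

class ResUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(torch.relu(x))))

class STResNetSketch(nn.Module):
    def __init__(self, in_ch, ch=32, n_units=4, ext_dim=8, H=32, W=32):
        super().__init__()
        def branch():                                  # closeness / period / trend branch
            return nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1),
                                 *[ResUnit(ch) for _ in range(n_units)],
                                 nn.Conv2d(ch, 2, 3, padding=1))      # inflow/outflow maps
        self.branches = nn.ModuleList([branch() for _ in range(3)])
        self.fusion = nn.Parameter(torch.ones(3, 2, H, W))            # per-branch, per-region weights
        self.ext = nn.Linear(ext_dim, 2 * H * W)                      # external factors (weather, day)
        self.H, self.W = H, W

    def forward(self, x_close, x_period, x_trend, ext):
        outs = [b(x) for b, x in zip(self.branches, (x_close, x_period, x_trend))]
        fused = sum(self.fusion[i] * outs[i] for i in range(3))       # learned weighted aggregation
        fused = fused + self.ext(ext).view(-1, 2, self.H, self.W)
        return torch.tanh(fused)                                      # flows scaled to [-1, 1]
```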